Introduction

• In 1993, a new coding scheme, called TURBO codes, was proposed
  for achieving near capacity performance in the power limited
  region of the additive white Gaussian noise (AWGN) channel.

• TURBO codes use a parallel concatenation of rate 1/2 convolutional
  encoders combined with iterative maximum a posteriori probability
  (MAP) decoding to achieve a bit error rate (BER) of $10^{-5}$ at a
  signal-to-noise ratio (SNR) of only 0.7 dB.

• The channel capacity limit for a rate 1/2 code with binary
  phase-shift-keyed (BPSK) modulation on the AWGN channel is
  $E_b/N_0 = 0$ dB, and thus the TURBO coding scheme comes within
  0.7 dB of capacity at a BER of $10^{-5}$.

• In this paper, we present some results on the relationship between
  the structure of the TURBO encoder and the resulting distance
  spectrum of the code. These results provide an explanation for the
  excellent performance of this coding scheme.
TURBO Code Performance

Figure 1: Performance of the TURBO Code.

The performance curve of the TURBO code has two remarkable
features:

1. For low SNR's, the curve is very steep, thus enabling near
   capacity performance at a BER of $10^{-5}$.

2. For moderate and high SNR's, the curve flattens out,
   resulting in what has been called the “error floor”.

The performance of the TURBO code is distinctly different
from the performance of conventional convolutional codes such
as the maximal free distance (MFD) (2, 1, 14) code with Viterbi
decoding.
TURBO Encoder

Figure 2: Block diagram of the TURBO encoder.

• The TURBO code uses two identical rate R = 1/2, constraint
  length ν = 4, convolutional encoders in systematic feedback
  form in a “parallel concatenation” configuration.

• Each encoder is punctured to rate R = 2/3. The systematic
  form of the encoders results in each information bit being sent
  only once. Thus, the overall rate of the TURBO encoder is

$$R_{turbo} = \frac{2}{2 + 1 + 1} = \frac{1}{2}.$$

• The pseudorandom interleaver ensures, with high probability,
  that the codeword generated by the first encoder in response to
  the input sequence $\{x_k\}$ is different from the codeword
  generated by the second encoder in response to the interleaved
  input sequence $\{x'_k\}$.
• For example, the encoder used in the original TURBO code
  has the following generator matrix

$$G(D) = \begin{bmatrix} 1 & \dfrac{1 + D^4}{1 + D + D^2 + D^3 + D^4} \end{bmatrix}.$$

• If the input sequence is $x(D) = 1 + D + D^2 + D^3 + D^4$, then the
  output of the first encoder is $[x(D), y_1(D)]$, where
  $y_1(D) = 1 + D^4$ is the parity output (ignoring puncturing).

• Suppose the interleaver maps the input sequence $x(D)$ to the
  sequence $x'(D) = 1 + D^2 + D^3 + D^5 + D^6$, which becomes the input
  to the second encoder. (Note that the weight of the input
  sequence is not changed by the interleaver.) The output of the
  second encoder is then $[x'(D), y_2(D)]$, where $y_2(D)$, the parity
  output, now has infinite weight, since $x'(D)$ and the denominator
  polynomial $1 + D + D^2 + D^3 + D^4$ are relatively prime.

• In this manner, the pseudorandom interleaver makes it unlikely
  that both encoders will generate low weight parity sequences
  in response to a particular input sequence.

• It is tempting at this point to conclude that the excellent
  performance of the TURBO code is due to a large free distance.
  However, this turns out not to be the case.
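
The generator matrix example above can be checked with a few lines of GF(2) polynomial arithmetic. The following is a minimal sketch, not the authors' code: the helper names (`poly`, `gf2_mul`, `gf2_divmod`, `finite_parity`) are hypothetical, and polynomials are stored as integers whose bit i is the coefficient of $D^i$. A parity sequence is finite exactly when the feedback polynomial divides $x(D)(1 + D^4)$.

```python
# GF(2) polynomials are stored as integers: bit i is the coefficient of D^i.

def poly(*exponents):
    """Build a GF(2) polynomial from the exponents of its nonzero terms."""
    mask = 0
    for e in exponents:
        mask ^= 1 << e
    return mask


def gf2_mul(a, b):
    """Carry-less (XOR) multiplication of two GF(2) polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf2_divmod(a, b):
    """Long division over GF(2) by a nonzero b; returns (quotient, remainder)."""
    quotient = 0
    deg_b = b.bit_length() - 1
    while a.bit_length() - 1 >= deg_b:
        shift = a.bit_length() - 1 - deg_b
        quotient ^= 1 << shift
        a ^= b << shift
    return quotient, a


FEEDBACK = poly(0, 1, 2, 3, 4)   # 1 + D + D^2 + D^3 + D^4
FEEDFORWARD = poly(0, 4)         # 1 + D^4


def finite_parity(x):
    """Return y(D) = x(D)(1 + D^4) / feedback if it is a polynomial, else None.

    A nonzero remainder means the feedback encoder never returns to the
    all zero state, so the parity sequence has infinite weight.
    """
    quotient, remainder = gf2_divmod(gf2_mul(x, FEEDFORWARD), FEEDBACK)
    return quotient if remainder == 0 else None


x = poly(0, 1, 2, 3, 4)              # x(D) from the example
x_interleaved = poly(0, 2, 3, 5, 6)  # x'(D) from the example

y1 = finite_parity(x)
y2 = finite_parity(x_interleaved)
print(bin(y1))  # parity polynomial of the first encoder as a bit mask
print(y2)       # None signals an infinite weight parity sequence
```

Storing polynomials as bit masks keeps multiplication and division to shifts and XORs, which is enough for small hand-checkable examples like this one.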
TURBO Decoder

Figure 3: Block diagram of the TURBO Decoder.

• The “parallel concatenation” of encoders is iteratively decoded
  by two identical MAP decoders.

• In order for the iterative decoding technique to be effective,
  soft information must be passed from one decoder to the next.
  This is done using a portion of the log-likelihood ratio,
  $\Lambda_i(k)$, calculated by the MAP decoding algorithm.

• Each pass through the two decoders counts as one iteration.

• A total of 18 iterations are required to achieve a BER of $10^{-5}$
  at an SNR of 0.7 dB.
Convolutional Codes

• In order to understand the performance of TURBO codes, it
  is instructive to first consider the performance of convolutional
  codes with finite length input sequences and maximum likelihood
  (ML) decoding.

• With finite length input sequences of length N, a (2, 1, ν)
  convolutional code may be viewed as a block code with $2^N$
  codewords of length 2(ν + N).

• The information bit error probability of the code is upper bounded
  by

$$P_b \le \sum_{i=1}^{2^N - 1} \frac{w_i}{N}\, Q\left(\sqrt{d_i\,\frac{2RE_b}{N_0}}\right),$$

  where $w_i$ and $d_i$ are the information weight and total Hamming
  weight, respectively, of the i th codeword.

• Collecting codewords of the same total Hamming weight and
  defining the average information weight per codeword as

$$\tilde{w}_d = \frac{W_d}{N_d},$$

  where $W_d$ is the total information weight of all codewords of
  weight d and $N_d$ is the number (multiplicity) of codewords of
  weight d, yields

$$P_b \le \sum_{d=d_{free}}^{2(\nu+N)} \frac{N_d \tilde{w}_d}{N}\, Q\left(\sqrt{d\,\frac{2RE_b}{N_0}}\right).$$
Convolutional Codes (cont.)

• If a time-invariant convolutional code has $N_d^0$ codewords of
  weight d and length l caused by information sequences $x(D)$
  whose first one occurs at time 0, then it also has $N_d^0$ codewords
  of weight d and length l caused by information sequences $Dx(D)$,
  $N_d^0$ codewords of weight d and length l caused by information
  sequences $D^2 x(D)$, and so on.

• Thus, as the length of the information sequences increases we
  have

$$N_d \to N_d^0 N \quad \text{and} \quad \lim_{N \to \infty} \frac{N_d}{N} = N_d^0,$$

  and the bound on the BER of a convolutional code with ML
  decoding becomes

$$P_b \le \sum_{d=d_{free}}^{\infty} \frac{N_d \tilde{w}_d}{N}\, Q\left(\sqrt{d\,\frac{2RE_b}{N_0}}\right) = \sum_{d=d_{free}}^{\infty} N_d^0 \tilde{w}_d\, Q\left(\sqrt{d\,\frac{2RE_b}{N_0}}\right).$$

  This bound is dominated by the first term for moderate and
  high SNR's.

• For this reason, most efforts to find good convolutional codes
  have focused on finding convolutional codes that maximize the
  free distance $d_{free}$ and minimize the multiplicity $N_{free}$ for a
  given rate and constraint length.



April 15, 1996 


TURBO Codes 




The MFD (2,1,14) Code 




Figure 4: Asymptotic performance of the (2, 1, 14) code. 

• The first term in the bound is given by

$$\frac{N_{free}\tilde{w}_{free}}{N}\, Q\left(\sqrt{d_{free}\,\frac{2RE_b}{N_0}}\right) = W_{free}^0\, Q\left(\sqrt{d_{free}\,\frac{2RE_b}{N_0}}\right),$$

  where $\tilde{w}_{free} = W_{free}^0 / N_{free}^0$ follows from time invariance.
  (For the MFD (2, 1, 14) code, $d_{free} = 18$, $N_{free}^0 = 33$, and
  $W_{free}^0 = 187$.) This term is referred to as the free distance
  asymptote and accurately predicts the performance of the code
  for high SNR's.

• For low SNR's, however, the performance is much worse than
  the free distance asymptote.
The (2,1,14) Code (cont.)

• The performance of the MFD (2,1,14) code for high SNR's is
  limited by its free distance asymptote.

• But for low and moderate SNR's there is a significant gap
  between the asymptote and the actual performance of the code.
  This gap can be explained by examining the distance spectrum
  of the (2, 1, 14) code.

• The distance spectrum of this code is

    d     N_d^0       W_d^0
    18    33          187
    20    136         1034
    22    835         7857
    24    4787        53994
    26    27941       361762
    28    162513      2374453
    30    945570      15452996
    32    5523544     99659236

• Recall that the total multiplicity $N_d \to N_d^0 N$ for large N.
The (2,1,14) Code (cont.)

Figure 5: Decomposed performance of the MFD (2,1,14) code.

• By plotting the contribution to the BER of each spectral line,
  it is easily seen that for SNR's less than $E_b/N_0 = 2.7$ dB the
  performance is not dominated by the free distance asymptote.

• Instead, the higher distance paths dominate the performance
  for these SNR's due to their very large multiplicities.

• Thus, the relatively large difference between the real coding
  gain and the asymptotic coding gain of the (2, 1, 14) code is due
  to its very dense distance spectrum.
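
The decomposition into spectral lines can be sketched numerically. The snippet below is an illustration under stated assumptions rather than the computation behind Figure 5: it assumes the third column of the distance spectrum table is the total information weight $W_d^0$, so that each line contributes $W_d^0\, Q(\sqrt{d \cdot 2RE_b/N_0})$ with R = 1/2, and it only uses the eight listed lines. The function names are hypothetical.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


# d: (N_d^0, W_d^0), copied from the distance spectrum table of the (2,1,14) code.
SPECTRUM_2_1_14 = {
    18: (33, 187),
    20: (136, 1034),
    22: (835, 7857),
    24: (4787, 53994),
    26: (27941, 361762),
    28: (162513, 2374453),
    30: (945570, 15452996),
    32: (5523544, 99659236),
}
RATE = 0.5


def line_contributions(ebno_db, spectrum=SPECTRUM_2_1_14, rate=RATE):
    """Union bound term of each spectral line at the given Eb/N0 in dB."""
    ebno = 10.0 ** (ebno_db / 10.0)
    return {
        d: total_info_weight * q_func(math.sqrt(d * 2.0 * rate * ebno))
        for d, (_, total_info_weight) in spectrum.items()
    }


def dominant_distance(ebno_db):
    """Distance whose spectral line contributes most to the truncated bound."""
    terms = line_contributions(ebno_db)
    return max(terms, key=terms.get)


for snr_db in (1.0, 2.0, 2.7, 3.0, 4.0, 5.0):
    print(snr_db, dominant_distance(snr_db))
```

Because the table is truncated at d = 32, the dominant line reported at low SNR is only the largest of the listed terms; the qualitative point is that it is not the free distance line.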
TURBO Codes

• The performance of the TURBO code with ML decoding is also
  upper bounded by

$$P_b \le \sum_{i=1}^{2^N - 1} \frac{w_i}{N}\, Q\left(\sqrt{d_i\,\frac{2RE_b}{N_0}}\right),$$

  where $w_i$ and $d_i$ are the information weight and total Hamming
  weight, respectively, of the i th codeword.

• However, in the TURBO encoder, the pseudorandom interleaver
  maps the sequence $x(D)$ to $x'(D)$ and the sequence $Dx(D)$
  to a sequence $x''(D)$ that is different from $Dx'(D)$ with very high
  probability.

• Thus, the input sequences $x(D)$ and $Dx(D)$ produce different
  codewords with different Hamming weights.

• As the interleaver size N increases, the total multiplicity of
  free distance codewords $N_{free}$ (not just those starting at time
  0) approaches a constant that is much less than N. (This is not
  true for rectangular interleavers!)

• Thus, the free distance asymptote for a TURBO code with a
  pseudorandom interleaver is

$$\frac{N_{free}\tilde{w}_{free}}{N}\, Q\left(\sqrt{d_{free}\,\frac{2RE_b}{N_0}}\right),$$

  where

$$\lim_{N \to \infty} N_{free} \ll N.$$

Figure 6: Asymptotic performance of the TURBO code.

• The original TURBO code with an N = 65536 pseudorandom
  interleaver has $d_{free} = 6$, $N_{free} = 3$, and $\tilde{w}_{free} = 2$.

• The free distance asymptote for the TURBO code is thus

$$P_b \approx \frac{3 \cdot 2}{65536}\, Q\left(\sqrt{6\,\frac{2RE_b}{N_0}}\right).$$

  Note that the exponent is much smaller than for the (2,1,14)
  code, but the coefficient is also greatly reduced, resulting in a
  much flatter curve.

• Comparing the simulation results of the TURBO code with
  the free distance asymptote, it is clear that the “error floor”
  is simply the result of the code approaching its free distance
  asymptote.
Spectral Thinning

In view of the analysis of the (2, 1, 14) code, it is reasonable to 
suggest that the excellent performance of the TURBO code at 
low and moderate SNR’s is due to a relatively “thin” distance 
spectrum. 


That is, the TURBO code is able to follow its free distance 
asymptote at lower SNR’s because the multiplicities of higher 
weight codewords are small enough that the free distance asymp- 
tote remains the dominant term in the bound. 


The distance spectrum of TURBO codes is the result of a pro- 
cess called “spectral thinning” in which the interleaver effec- 
tively moves many lower weight codewords to a higher weight. 


This theory is supported by simulation results and actual cal- 
culations of the distance spectrum of several TURBO codes. 
Spectral Thinning (cont.)

Figure 7: Hypothetical distance spectra of TURBO codes.

• This figure conceptually depicts the process of spectral thinning
  for three different size pseudorandom interleavers.

• As the size of the interleaver increases, more low weight
  codewords are moved to higher weights and the distance spectrum
  approaches a binomial distribution with small variance and
  mean close to N.
• With an interleaver of length N = 100000, the TURBO code has
  the following distance spectrum:

    d     N_d     W_d
    6     8       16
    8     22      54
    10    41      157
    12    150     323
    14    721     1462

• Recall that these are the total multiplicities $N_d$, not just the
  values of $N_d^0$.

• In this case, we see that the free distance asymptote remains
  the dominant performance parameter even for low SNR's.
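
As a rough check of this claim, the listed spectral lines can be plugged into the union bound. This is a sketch under assumptions, not the authors' calculation: it takes the third table column to be the total information weight $W_d$ (so that $N_d \tilde{w}_d = W_d$), uses R = 1/2, ignores all lines beyond d = 14, and uses the hypothetical helper names below.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


N_INTERLEAVER = 100000
# d: (N_d, W_d), copied from the N = 100000 TURBO code spectrum above.
TURBO_SPECTRUM = {6: (8, 16), 8: (22, 54), 10: (41, 157), 12: (150, 323), 14: (721, 1462)}


def truncated_bound_terms(ebno_db, rate=0.5):
    """Terms (W_d / N) Q(sqrt(d 2 R Eb/N0)) for the listed spectral lines."""
    ebno = 10.0 ** (ebno_db / 10.0)
    return {
        d: (w_total / N_INTERLEAVER) * q_func(math.sqrt(d * 2.0 * rate * ebno))
        for d, (_, w_total) in TURBO_SPECTRUM.items()
    }


def free_distance_share(ebno_db):
    """Fraction of the truncated bound contributed by the d = 6 line."""
    terms = truncated_bound_terms(ebno_db)
    return terms[6] / sum(terms.values())


for snr_db in (1.0, 2.0, 3.0):
    terms = truncated_bound_terms(snr_db)
    print(snr_db, max(terms, key=terms.get), round(free_distance_share(snr_db), 3))
```

In contrast to the (2,1,14) code, the largest listed term here is the free distance line at these SNR's, which is the behavior the slide attributes to the thin spectrum.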
Conclusions

• The excellent performance of TURBO codes may be explained 
in terms of the distance spectrum of the code. 


• The “error floor” observed in simulations of TURBO codes is 
a manifestation of the free distance asymptote. Since TURBO 
codes have relatively low free distances, the free distance asymp- 
tote dominates performance at moderate and high SNR’s. 


• The “error floor” can be lowered by increasing the size of the
interleaver (without changing the free distance) or by increasing
$d_{free}$.

• The exceptional performance of TURBO codes at low SNR’s 
is due to “spectral thinning” and the resultant ability of the 
code to follow its free distance asymptote almost to channel 
capacity. 


• The combination of long block lengths, N, and low multiplicity,
$N_{free}$, results in a very small effective multiplicity

$$N_{eff} = \frac{N_{free}}{N} \ll 1$$

compared to convolutional codes, where $N_{eff} = N_{free}^0 \ge 1$.


• The complete distance spectrum has a random-like distribu- 
tion, thereby approximating a long, random block code that 
can be decoded with reasonable complexity. This is consistent 
with Shannon’s noisy channel coding theorem and explains the 
near capacity performance at low SNR’s. 
Abstract

The performance of Turbo codes is addressed by examining the code’s distance
spectrum. The ‘error floor’ that occurs at moderate signal-to-noise ratios is shown
to be a consequence of the relatively low free distance of the code. It is also 
shown that the ‘error floor’ can be lowered by increasing the size of the interleaver 
without changing the free distance of the code. Alternatively, the free distance of 
the code may be increased by using primitive feedback polynomials. The excellent 
performance of Turbo codes at low signal-to-noise ratios is explained in terms of 
the distance spectrum. The interleaver in the Turbo encoder is shown to reduce 
the number of low weight codewords through a process called ‘spectral thinning’. 

This thinned distance spectrum results in the free distance asymptote being the 
dominant performance parameter for low and moderate signal-to-noise ratios. 
1 Introduction


The discovery of Turbo codes and the near capacity performance reported in [1] has 
stimulated a flurry of research efforts to fully understand this new coding scheme [2]-[44]. 
Initially greeted with some skepticism, the original results were independently
reproduced by several researchers [6], [7], [11]-[12], [13], [14], and [31]. Subsequently,
research on Turbo codes has focused on understanding the reasons for their outstanding
performance [8]-[10], [16], [23]-[25], [29].

At this point, there are two fundamental questions regarding Turbo codes. First, does 
the iterative decoding scheme presented in [1] always converge to the optimum solution? 
Second, assuming optimum or near optimum decoding, why do the Turbo codes perform 
so well? In this paper, we attempt to address the second issue by examining the distance 
spectrum of Turbo codes. In doing so, we will draw on the work of several research groups 
involved with Turbo codes [6]-[10], [13]-[15], [19]-[25]. Due to the intense interest in this 
subject, many results involving Turbo codes have been developed independently by others 
and the reader is encouraged to peruse the references for an alternative point of view. 

The simulated performance of a rate 1/2 Turbo code with the same parameters as 
in [1] is shown in Figure 1 along with simulation results for a rate R = 1/2, memory
ν = 14, convolutional code. The comparison of these simulation results raises two issues
regarding the performance of Turbo codes. First, what is it that allows Turbo codes to
achieve a bit error rate (BER) of $10^{-5}$ at a signal-to-noise ratio (SNR) of $E_b/N_0 = 0.7$
dB, which is only 0.7 dB from the Shannon limit? Second, what causes the “error floor”,
that is, the flattening of the performance curve, for moderate to high SNR's? Here,
we endeavor to explain the performance of Turbo codes, and thus address these two 
issues, in terms of the code’s distance spectrum. We do not attempt to address the many 
interesting questions concerning the iterative decoding method (see, e.g., [27], [43], and 
[44]), i.e., we assume that an optimum or near optimum decoder is available. 

In order to explain their performance in terms of the free distance and the distance 
spectrum, we will examine the codeword structure of Turbo codes in detail. Here, the 
free distance is defined to be the minimum Hamming weight of all possible codewords 
and the error coefficient is the total number, or multiplicity, of free distance codewords. 
The goal is to use specific examples to elucidate the key structural properties that result 
in the near capacity performance of Turbo codes at BER’s around $10^{-5}$. As will be seen,
this effort leads to an interpretation that applies to Turbo codes and also lends insight 
into designing codes in general. Throughout the paper, Turbo codes are compared to a 
maximum free distance, rate R = 1/2, memory v = 14, i.e., a (2,1,14), convolutional 
code to emphasize the differences in performance and structure. Techniques for analyzing 
the performance of Turbo codes using transfer functions and the like may be found in
[6], [10], and [23]. 

The paper begins with a detailed examination of the structure of codewords in a 
Turbo code in Section II. This leads to the calculation of the free distance of a particular 
Turbo code and an explanation for the error floor in its performance curve. In Section III, 
the distance spectrum of Turbo codes is considered and a theory called spectral thinning 
is introduced and used to explain the performance of Turbo codes at low SNR's. The
idea of spectral thinning is then formalized in Section IV through the use of random 
interleaving. Finally, some conclusions are drawn concerning the distance spectrum of 
Turbo codes and the consequences of this on the design of codes in general. 


2 The Free Distance of Turbo Codes 

In order to find the free distance of a Turbo code, it is necessary to understand the basic 
structure of the encoder and the resulting codewords. A typical Turbo encoder consists of 
the parallel concatenation of two or more, usually identical, rate 1/2 encoders, realized in 
systematic feedback form, and an interleaver. This encoder structure is called a parallel 
concatenation because the two encoders operate on the same set of input bits, rather 
than one encoding the output of the other. A block diagram of a Turbo encoder with 
two constituent convolutional encoders is shown in Figure 2. For the remainder of the 
paper, only Turbo encoders with two identical constituent convolutional encoders are 
considered, though the conclusions are easily extended to the general case. 

The interleaver is used to permute the input bits such that the two encoders are 
operating on the same set of input bits, but different input sequences. Thus, the first 
encoder receives the input bit $x_i$ and produces the output pair $(x_i, y_i^1)$ while the second
encoder receives the input bit $x'_i$ and produces the output pair $(x'_i, y_i^2)$. The input bits are
grouped into finite length sequences whose length, N, equals the size of the interleaver.
Since both the encoders are systematic and operate on the same set of input bits, it is
only necessary to transmit the input bits once and the overall code has rate 1/3. In order
to increase the overall rate of the code to 1/2, the two parity sequences $\{y^1\}$ and $\{y^2\}$
can be punctured by alternately deleting $y^1$ and $y^2$. We will refer to a Turbo code whose
constituent encoders have parity check polynomials $h_0$ and $h_1$, expressed in either octal
or D transform notation, and whose interleaver is of length N, as an $(h_0, h_1, N)$ Turbo
code.

For example, consider the Turbo encoder shown in Figure 2, where each constituent 
encoder is a (2, 1, 2) encoder with parity check polynomials $h_0(D) = 1 + D^2$ and $h_1(D) = D$.
For purposes of illustration, assume a pseudorandom interleaver of size N = 16 bits
which generates a $(1 + D^2, D, 16)$ Turbo code. The interleaver is realized as a 4×4 matrix
which is filled sequentially, row by row, with the input bits $x_i$. Once the interleaver
has been filled, the input sequence to the second encoder, $x'$, is obtained by reading
the interleaver in a pseudorandom manner until each bit has been read once and only
once. The pseudorandom nature of the interleaver in this example is represented by a
permutation $\Pi_{16} = \{6, 14, 4, 7, 11, 8, 3, 5, 9, 13, 0, 2, 12, 1, 10, 15\}$, which implies $x'_0 = x_{15}$,
$x'_1 = x_{10}$, and so on.

If the input sequence is $x = \{x_{15} \ldots x_0\} = \{0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1\}$ and
the interleaver is represented by the permutation $\Pi_{16}$, then the input sequence to the
second encoder is $x' = \Pi_{16}(x) = \{0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0\}$. The trellis
diagrams for both constituent encoders with these inputs are shown in Figure 3. The
corresponding unpunctured parity sequences are $y^1 = \{0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0\}$
and $y^2 = \{1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0\}$. The resulting codeword has Hamming
weight $d = w(x) + w(y^1) + w(y^2) = 4 + 3 + 3 = 10$ without puncturing, where $w(x)$ is the
Hamming weight of the sequence x. If the code is punctured beginning with $y_0^1$, then the
resulting codeword has weight d = 4 + 3 + 2 = 9. If, on the other hand, the puncturing
begins with $y_0^2$, then the punctured codeword has Hamming weight 4 + 1 + 1 = 6.

Finding the free distance of a Turbo code is complicated by the fact that Turbo
encoders are time-varying due to the interleaver. That is, if $\tilde{x} = Dx$, where D is the delay
operator [45], then $\tilde{y}^1 = Dy^1$, but $\tilde{x}' \ne Dx'$ (with high probability) and $\tilde{y}^2 \ne Dy^2$. (Here,
we only consider delays of a finite length sequence x for which no ones are lost.)
Continuing the example, if $\tilde{x} = Dx$, then $\tilde{x}' = \Pi_{16}(\tilde{x}) = \{0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1\}$
and $\tilde{y}^2 = \{0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0\}$. Thus, time shifting the input bits results
in codewords that differ in both bit position and overall Hamming weight! The variation 
in the weights of codewords corresponding to time shifted input sequences is magnified 
by puncturing. 
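
The $(1 + D^2, D, 16)$ example can be reproduced with a short simulation. This is a minimal sketch, not the authors' software: the function names are hypothetical, the sequences printed in the text are assumed to be listed from index 15 down to index 0 (consistent with $x'_0 = x_{15}$ and $x'_1 = x_{10}$), and only the unpunctured codeword weights are computed.

```python
def rsc_parity(bits, feedback, feedforward):
    """Parity of a systematic feedback encoder, y(D) = x(D) h1(D) / h0(D).

    bits[k] is x_k; feedback and feedforward are coefficient lists
    [c_0, c_1, ...] of h0(D) and h1(D), with feedback[0] == 1.
    """
    memory = max(len(feedback), len(feedforward)) - 1
    history = [0] * memory              # a_{k-1}, a_{k-2}, ...
    parity = []
    for x in bits:
        a = x
        for j in range(1, len(feedback)):
            a ^= feedback[j] & history[j - 1]
        taps = [a] + history            # a_k, a_{k-1}, ...
        y = 0
        for j, c in enumerate(feedforward):
            y ^= c & taps[j]
        parity.append(y)
        history = taps[:memory]
    return parity


H0 = [1, 0, 1]   # h0(D) = 1 + D^2 (feedback)
H1 = [0, 1]      # h1(D) = D       (feedforward)

# Sequences as printed in the text, assumed to run from index 15 down to 0.
PI_LISTED = [6, 14, 4, 7, 11, 8, 3, 5, 9, 13, 0, 2, 12, 1, 10, 15]
X_LISTED = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]

pi = PI_LISTED[::-1]   # pi[k] is the index read at time k, so x'_k = x_{pi[k]}
x = X_LISTED[::-1]     # x[k] is x_k


def turbo_encode(info):
    """Return (interleaved input, first parity, second parity), unpunctured."""
    interleaved = [info[p] for p in pi]
    return interleaved, rsc_parity(info, H0, H1), rsc_parity(interleaved, H0, H1)


def codeword_weight(info):
    """Unpunctured Hamming weight w(x) + w(y1) + w(y2)."""
    _, y1, y2 = turbo_encode(info)
    return sum(info) + sum(y1) + sum(y2)


x_shifted = [0] + x[:-1]   # Dx; x_15 = 0, so no ones are lost

print(codeword_weight(x), codeword_weight(x_shifted))
```

Under these assumptions the sketch reproduces the listed $x'$, $y^1$, and $y^2$, and the shifted input yields a codeword of different weight, which is the time-varying behavior described above.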

This simple example illustrates several salient points concerning the structure of the 
codewords. First, because the pseudorandom interleaver permutes the input bits, the two 
input sequences x and x' are almost always different, though of the same weight, and the
two encoders will (with high probability) produce parity sequences of different weights. 
Second, it is easily seen that a codeword may consist of a number of distinct error events 
in each encoder, where an error event is a path in the trellis that diverges from the all zero 
state and then remerges with the all zero state within a finite number of branches. Note 
that since the constituent encoders are realized in systematic feedback form a nonzero 
sequence is required to return the encoder to the all zero state. Thus, since at least 
one nonzero bit is required to start the error event, all error events are associated with 
information sequences of weight 2 or greater [9]. Finally, with a pseudorandom interleaver
it is highly unlikely that both encoders will be returned to the all zero state at the end
of the codeword even when the last ν bits of the input sequence x are used to force the
first encoder back to the all zero state. 

If neither encoder is forced back to the all zero state, i.e., no tail is used, then the 
sequence consisting of N — 1 zeroes followed by a one is a valid input sequence x to 
the first encoder. For some interleavers, this x will be permuted to itself and x' will be
the same sequence. In this case, with puncturing, the weight of the codeword and the
free distance of the code will be at most two. Note that this codeword is caused by an
information sequence of weight one! Thus, forcing the first encoder to return to the all
zero state ensures that every information sequence has at least weight two and eliminates
the possibility of a weight two codeword. For this reason, it is common to assume that 
the first encoder is forced to return to the all zero state. 

The ambiguity of the final state of the second encoder has been shown by simulation 
to result in negligible performance degradation [13], [31] for large interleavers. For these 
reasons, it will be assumed for the remainder of the paper that the first encoder is forced 
to return to the all zero state and that the final state of the second encoder is unknown. 
Special interleaver structures that result in both encoders returning to the all zero state 
are discussed in [32], [33], [35], [36] and [37]. 


2.1 Performance Bounds 

In order to make clear the distinction between Turbo codes and convolutional codes, it 
is useful to consider these codes as block codes. To this end, the input sequences are 
restricted to length N, where N corresponds to the size of the interleaver in the Turbo 
encoder. With finite length input sequences of length N, a (2, 1, ν) convolutional code
may be viewed as a block code with $2^N$ codewords of length 2(ν + N).

The bit error rate (BER) performance of a convolutional code with maximum likeli- 
hood (ML) decoding on an additive white Gaussian noise (AWGN) channel with an SNR 
of $E_b/N_0$ can be upper bounded using a union bound technique by [13]

$$P_b \le \sum_{i=1}^{2^N - 1} \frac{w_i}{N}\, Q\left(\sqrt{d_i\,\frac{2RE_b}{N_0}}\right), \tag{1}$$

where $w_i$ and $d_i$ are the information weight and total Hamming weight, respectively, of
the i th codeword. Collecting codewords of the same total Hamming weight and defining
the average information weight per codeword as

$$\tilde{w}_d = \frac{W_d}{N_d},$$

where $W_d$ is the total information weight of all codewords of weight d and $N_d$ is the total
number, or multiplicity, of codewords of weight d, yields

$$P_b \le \sum_{d=d_{free}}^{2(\nu+N)} \frac{N_d \tilde{w}_d}{N}\, Q\left(\sqrt{d\,\frac{2RE_b}{N_0}}\right), \tag{2}$$

where $d_{free}$ is the free distance of the code. (In this development, the multiplicity $N_d$
includes codewords due to multiple error events for $d \ge 2 d_{free}$.)

If a convolutional code has $N_d^0$ codewords of weight d caused by information sequences
x whose first one occurs at time 0, then it also has $N_d^0$ codewords of weight d caused
by the information sequences Dx, $N_d^0$ codewords of weight d caused by the information
sequences $D^2 x$, and so on. Thus, as the length of the information sequences increases,
we have

$$\lim_{N \to \infty} \frac{N_d}{N} = N_d^0$$

and

$$\lim_{N \to \infty} \tilde{w}_d = \lim_{N \to \infty} \frac{W_d}{N_d} = \frac{W_d^0}{N_d^0} \triangleq w_d^0,$$

where $W_d^0$ is the total information weight of all codewords with weight d which are caused
by information sequences whose first one occurs at time 0. Thus, the bound on the BER
of a convolutional code with ML decoding becomes

$$P_b \le \sum_{d=d_{free}}^{2(\nu+N)} N_d^0 w_d^0\, Q\left(\sqrt{d\,\frac{2RE_b}{N_0}}\right) = \sum_{d=d_{free}}^{2(\nu+N)} W_d^0\, Q\left(\sqrt{d\,\frac{2RE_b}{N_0}}\right), \tag{3}$$

which is the standard union bound for ML decoding. For this reason, efforts to find
good convolutional codes for use with ML decoders have focused on finding codes that
maximize the free distance $d_{free}$ and minimize the number of free distance paths $N_{free}$
for a given rate and constraint length.

The performance of a Turbo code with maximum likelihood decoding can also be 
bounded using the union bound of equation (2). However, in the Turbo encoder the 
pseudorandom interleaver maps the input sequence x to x' and the input sequence Dx to
a sequence x'' that is different from Dx' with high probability. Thus, unlike convolutional
codes, the input sequences x and Dx produce different codewords with different Hamming
weights. For Turbo codes with pseudorandom interleavers, $N_d \tilde{w}_d$ is much less than N
for low weight codewords. This is due to the pseudorandom interleaver which, with
high probability, maps low weight parity sequences in the first constituent encoder to
high weight parity sequences in the second constituent encoder. Thus, for low weight
codewords

$$\frac{\tilde{w}_d N_d}{N} \ll 1,$$

where

$$\frac{N_d}{N} \tag{4}$$

is called the effective multiplicity of codewords of weight d. The effect of the interleaver
size on the multiplicity is also reported in [7], [8], and [9].


2.2 Asymptotic Performance 

For moderate and high signal-to-noise ratios, it is well known that the free distance term 
in the union bound on the bit error rate performance dominates the bound [45]. Thus, 
for Turbo codes the asymptotic performance approaches 


$$P_b \approx \frac{N_{free}\tilde{w}_{free}}{N}\, Q\left(\sqrt{d_{free}\,\frac{2RE_b}{N_0}}\right), \tag{5}$$

where $N_{free}$ is the error coefficient and $\tilde{w}_{free}$ is the average weight of the information
sequences causing free distance codewords. The expression on the right hand side of
equation (5) and its associated graph is called the free distance asymptote, $P_{free}$, of a
Turbo code.

An algorithm for finding the free distance of Turbo codes is described in [29]. This 
algorithm was applied to a Turbo code with the same constituent encoders, puncturing 
pattern, and interleaver size N as in [1] and a particular pseudorandom interleaving 
pattern. The parity check polynomials for this code are $h_0 = D^4 + D^3 + D^2 + D + 1$
and $h_1 = D^4 + 1$, or $h_0 = 37$ and $h_1 = 21$ using octal notation. This (37,21,65536)
code was found to have $N_{free} = 3$ paths with weight $d_{free} = 6$. Each of these paths was
caused by an input sequence of weight 2 and thus $\tilde{w}_{free} = 2$. Though this result was
for a particular pseudorandom interleaver, it is true for most pseudorandom interleavers
with N = 65536. This is consistent with the conclusions in [6] in which the performance
of Turbo codes is averaged over all possible pseudorandom interleavers. 
For this particular Turbo code, the free distance asymptote is given by

$$P_{free} = \frac{3 \cdot 2}{65536}\, Q\left(\sqrt{6\,\frac{2RE_b}{N_0}}\right),$$

where the rate loss due to the addition of a 4-bit tail is ignored and

$$\frac{N_{free}}{N} = \frac{3}{65536}$$

is the effective multiplicity. The free distance asymptote is shown plotted in Figure 4
along with simulation results for this code using the iterative decoding algorithm of [1]
with 18 iterations. From Figure 4, it can clearly be seen that the simulation results do
in fact approach the free distance asymptote for moderate and high SNR's. Since the
slope of the asymptote is essentially determined by the free distance of the code, it can
be concluded that the “error floor” observed with Turbo codes is due to the fact that
they have a relatively small free distance and consequently a relatively flat free distance
asymptote.

Further examination of equation (5) reveals that the manifestation of the error floor 
can be manipulated in two ways. First, increasing the length of the interleaver while 
preserving the free distance and the error coefficient will lower the asymptote without 
changing its slope by reducing the effective multiplicity. In this case, the performance 
curve of Turbo codes does not flatten out until higher SNR’s and lower BER’s are reached.
Conversely, decreasing the interleaver size while maintaining the free distance and error 
coefficient results in the error floor being raised and the performance curve flattens at 
lower SNR’s and higher BER’s. This can be seen in the simulation results shown in 
Figure 5 for the (37,21,N) Turbo code with varying N. If the first constituent encoder
is not forced to return to the all zero state and the weight 2 codewords mentioned earlier 
are allowed, then the error floor is raised to the extent that the code performs poorly 
even for large interleavers. Thus, one cannot completely disregard free distance when 
constructing Turbo codes. 
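
The effect of the interleaver size on the error floor can be illustrated by evaluating equation (5) directly. The sketch below is hypothetical in two respects: the function name is invented, and it assumes that $d_{free} = 6$, $N_{free} = 3$, and $\tilde{w}_{free} = 2$ are preserved as N changes, which is the idealized situation the paragraph describes and need not hold for a real interleaver of a different size.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def free_distance_asymptote(ebno_db, n_interleaver, d_free=6, n_free=3,
                            w_free=2.0, rate=0.5):
    """Right hand side of equation (5) for the given code parameters."""
    ebno = 10.0 ** (ebno_db / 10.0)
    effective_weight = n_free * w_free / n_interleaver
    return effective_weight * q_func(math.sqrt(d_free * 2.0 * rate * ebno))


# Assumed: the free distance and error coefficient do not change with N.
for n in (1024, 16384, 65536):
    floor = [free_distance_asymptote(snr_db, n) for snr_db in (1.0, 2.0, 3.0)]
    print(n, ["%.2e" % p for p in floor])
```

Because N enters only through the factor $N_{free}\tilde{w}_{free}/N$, changing the interleaver size shifts the asymptote vertically without changing its slope.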

If the size of the interleaver is fixed, then the “error floor” can be modified by in- 
creasing the free distance of the code while preserving the error coefficient. This has the 
effect of changing the slope of the free distance asymptote. That is, increasing the free 
distance increases the slope of the asymptote and decreasing the free distance decreases 
the slope of the asymptote. It has been shown in [8], [25], [29] and [30] that for a fixed 
interleaver size, choosing the feedback polynomial to be a primitive polynomial results 
in an increased free distance and thus a steeper asymptote. An argument to support the 
use of primitive polynomials in Turbo codes is presented in section 4. 
2.3 Comparison to the (2,1,14) Code 

The role that the free distance and effective multiplicity play in determining the asymp- 
totic performance of a Turbo code is further clarified by examining the asymptotic per- 
formance of a convolutional code. The free distance asymptote of a convolutional code is 
given by the first term in the union bound of equation (3). The maximum free distance 
(2,1,14) code whose performance is shown in Figure 4 has $d_{free} = 18$, $N_{free} = 18$, and 
$W_{free} = 137$ [46]. Thus, the free distance asymptote for this code is 

$$P_{free} = 137\, Q\left(\sqrt{18 \frac{E_b}{N_0}}\right),$$

which is also shown in Figure 4. 

As expected, the free distance asymptote of the (2, 1, 14) code is much steeper than the 
free distance asymptote of the Turbo code due to the increased free distance. However, 
because the effective multiplicity of the free distance codewords, given by (4), of the Turbo 
code is much smaller than the multiplicity of the (2, 1, 14) code, the two asymptotes do 
not cross until an $E_b/N_0$ of 3.5 dB. At this $E_b/N_0$, the BER of both codes is less than 
$10^{-7}$, which is lower than the targeted BER of many practical systems. Thus, even though 
the (2,1,14) convolutional code is asymptotically better than the (37,21,65536) Turbo 
code, the Turbo code is better for the error rates at which many systems operate. 
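
The crossing point of the two asymptotes can be estimated numerically. The sketch below is illustrative only: it assumes both asymptotes have the form coefficient times $Q(\sqrt{2 R d_{free} E_b/N_0})$ with R = 1/2, uses the (2,1,14) values quoted above, and assumes for the Turbo code $d_{free} = 6$ with a total information weight of 6 and N = 65536 (the first spectral line reported in section 3). The function names are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def asymptote(coefficient, d_free, ebno_db, rate=0.5):
    """coefficient * Q(sqrt(2 * rate * d_free * Eb/N0)), with Eb/N0 given in dB."""
    ebno = 10.0 ** (ebno_db / 10.0)
    return coefficient * q_function(math.sqrt(2.0 * rate * d_free * ebno))


def conv_asymptote(ebno_db):
    # (2,1,14) code: W_free = 137, d_free = 18, as quoted in the text.
    return asymptote(137.0, 18, ebno_db)


def turbo_asymptote(ebno_db):
    # Assumed Turbo values: d_free = 6, information weight 6, N = 65536.
    return asymptote(6.0 / 65536.0, 6, ebno_db)


def crossing_db(lo=0.0, hi=6.0, tol=1e-6):
    """Bisection on the log-ratio of the two asymptotes over [lo, hi] dB."""
    def gap(x):
        return math.log(conv_asymptote(x)) - math.log(turbo_asymptote(x))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(lo) * gap(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


x = crossing_db()
print(f"asymptotes cross near {x:.2f} dB at a BER of about {turbo_asymptote(x):.1e}")
```

Under these assumptions the crossing falls between 3 and 4 dB at a BER below $10^{-7}$, in line with the figures quoted above.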


2.4 A Turbo Code with a Rectangular Interleaver 

To emphasize the importance of using a pseudorandom interleaver with Turbo codes, we 
now consider a Turbo code with a rectangular interleaver. Turbo codes with rectangular 
interleavers have also been considered in [2] where the effect of the interleaver on the 
free distance of the code is discussed. The same constituent encoders and puncturing 
pattern as in [1] are used in conjunction with a 120 x 120 rectangular interleaver. This 
rectangular interleaver is realized as a 120 x 120 matrix into which the information 
sequence x is written row by row. The input sequence to the second encoder, x', is then 
obtained by reading the matrix column by column. A 120 x 120 rectangular interleaver 
implies an interleaver size of N = 14400 and thus this is a (37,21,14400) Turbo code. 

Using the algorithm described in [29], this code was found to have a free distance of 
$d_{free} = 12$ with a multiplicity of $N_{free} = 28{,}900$. For this code, each of the free distance 
paths is caused by an information sequence of weight 4, so $\tilde{w}_{free} = 4$. The free distance 
asymptote for this code is thus given by 

$$P_{free} = \frac{28900 \times 4}{14400}\, Q\left(\sqrt{12 \frac{E_b}{N_0}}\right).$$

The free distance asymptote is plotted in Figure 6 along with simulation results using the 
iterative decoding algorithm of [1] with 18 iterations. This figure clearly shows that the 
free distance asymptote accurately estimates the performance of the code for moderate 
and high values of $E_b/N_0$. 

This code achieves a bit error rate of $10^{-5}$ at an $E_b/N_0$ of 2.7 dB and thus performs 2 dB 
worse than the (37,21,65536) Turbo code with a pseudorandom interleaver even though 
it has a much larger free distance. The relatively poor performance of the (37,21,14400) 
Turbo code with a rectangular interleaver is due to the large multiplicity of $d_{free}$ paths. 
This results in an effective multiplicity of 

$$\frac{N_{free}}{N} = \frac{28900}{14400} \approx 2,$$

which is much larger than the effective multiplicity of the (37,21,65536) Turbo code. We 
now show that the large multiplicity is a direct consequence of the use of the rectangular 
interleaver and that, furthermore, increasing the size of the interleaver does not result in 
a significant reduction in the effective multiplicity of the free distance codewords. 
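
As a rough numerical check, the free distance asymptote of the rectangular-interleaver code can be evaluated at the quoted operating point. This is a minimal sketch, assuming the asymptote has the form $(N_{free}\tilde{w}_{free}/N)\,Q(\sqrt{2 R d_{free} E_b/N_0})$ with R = 1/2; the function name is hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def rectangular_asymptote(ebno_db, n_free=28900, w_avg=4, n=14400, d_free=12, rate=0.5):
    """Assumed form: (n_free * w_avg / n) * Q(sqrt(2 * rate * d_free * Eb/N0))."""
    ebno = 10.0 ** (ebno_db / 10.0)
    return (n_free * w_avg / n) * q_function(math.sqrt(2.0 * rate * d_free * ebno))


print(f"asymptote at 2.7 dB: {rectangular_asymptote(2.7):.1e}")
```

Under these assumptions the asymptote evaluates to roughly $10^{-5}$ at 2.7 dB, consistent with the simulated operating point quoted above.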

The free distance paths in the Turbo code with the rectangular interleaver are due to 
four basic information sequences of weight 4. These information sequences are depicted 
in Figure 7 as they would appear in the rectangular interleaver. The “square” sequence 
in Figure 7a depicts the sequence $x = 1,0,0,0,0,1,0_{594},1,0,0,0,0,1,0_{\infty}$, where $0_{594}$ 
denotes a sequence of 594 consecutive zeroes and $0_{\infty}$ represents a sequence of zeroes 
that continues to the end of the information sequence. In this case, the rectangular 
interleaver maps the sequence x to itself and therefore $x' = x$. The sequence x results 
in a parity sequence $y_1$ from the first constituent encoder which, after puncturing, has 
weight 4. Similarly, the input sequence $x' = x$ results in a parity sequence $y_2$ from 
the second constituent encoder which, after puncturing, also has weight 4. The weight 
of the codeword is then $d_{free} = 4 + 4 + 4 = 12$. By counting the number of distinct 
positions in which these sequences can appear in the rectangular interleaver, we can find 
the multiplicity of the free distance codewords. Since the “square” sequence in Figure 
7a can appear in $(\sqrt{N} - 5) \times (\sqrt{N} - 5) = 13{,}225$ distinct positions in the rectangular 
interleaver, and in each case $x' = x$ and a codeword of weight $d_{free} = 12$ results, this 
yields 13,225 free distance codewords. Note that for every occurrence of the “square” 
sequence to result in a codeword of weight 12 the weight of both parity sequences must 
be invariant to which is punctured first. 

The “rectangular” sequences in Figures 7b and 7c also result in weight 12 codewords. 
For these two sequences, the weight of one of the parity sequences is affected by whether 
or not it is punctured first and only every other position in which the “rectangular” 
sequences appear in the interleaver results in a codeword of weight $d_{free} = 12$. Thus, the 
sequences in Figure 7b and Figure 7c each result in $0.5\,(\sqrt{N} - 10) \times (\sqrt{N} - 5) = 6{,}325$ free 
distance codewords. For the sequence in Figure 7d, the weight of both parity sequences 
is affected by which is punctured first and only one out of four positions in which the 
“rectangular” sequence appears in the interleaver results in a codeword of weight $d_{free} = 12$. 
Consequently, this sequence results in $0.25\,(\sqrt{N} - 10) \times (\sqrt{N} - 10) = 3{,}025$ free 
distance codewords. Summing the contributions of each type of sequence results in a 
total of $N_{free} = 28{,}900$ codewords of weight $d_{free} = 12$. 
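
The position counts above can be reproduced with a few lines of arithmetic. The sketch below simply evaluates the four expressions from the text for a square interleaver; the function name and the dictionary keys (named after the panels of Figure 7) are hypothetical.

```python
import math


def rectangular_free_distance_counts(n):
    """Evaluate the four position counts quoted in the text for a sqrt(n) x sqrt(n) interleaver."""
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("n must be a perfect square")
    counts = {
        "7a": (side - 5) * (side - 5),           # every position counts
        "7b": 0.5 * (side - 10) * (side - 5),    # every other position counts
        "7c": 0.5 * (side - 10) * (side - 5),    # every other position counts
        "7d": 0.25 * (side - 10) * (side - 10),  # one position in four counts
    }
    return counts, sum(counts.values())


counts, total = rectangular_free_distance_counts(14400)
print(counts, total)
```

For N = 14400 the four terms are 13,225, 6,325, 6,325, and 3,025, which sum to 28,900.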

It is tempting to try to improve the performance of a Turbo code with a rectangular 
interleaver by increasing the size of the interleaver. However, all of the information 
sequences shown in Figure 7 would still occur in a larger rectangular interleaver, so 
the free distance cannot be increased by increasing N. Also, since the number of free 
distance codewords is on the order of N, increasing the size of the interleaver results 
in a corresponding increase in $N_{free}$ such that the effective multiplicity $N_{free}/N$ does 
not change significantly. Without the benefit of a reduced effective multiplicity, the 
free distance asymptote, and thus the “error floor”, of Turbo codes with rectangular 
interleavers is not lowered enough for them to manifest the excellent performance of 
Turbo codes with pseudorandom interleavers for moderate BER’s. Attempts to design 
interleavers for Turbo codes generally introduce structure to the interleaver and thus 
destroy the very randomness that results in such excellent performance at low SNR’s. 
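
The scaling of the effective multiplicity can be made explicit under one assumption that goes beyond the text: that the four position counts obtained for the 120 x 120 interleaver keep the same form for any square interleaver of side $\sqrt{N}$. Summing the four contributions then gives

$$N_{free}(N) = (\sqrt{N} - 5)^2 + 2 \cdot \tfrac{1}{2}(\sqrt{N} - 10)(\sqrt{N} - 5) + \tfrac{1}{4}(\sqrt{N} - 10)^2 = \tfrac{9}{4} N - 30\sqrt{N} + 100,$$

$$\frac{N_{free}(N)}{N} = \frac{9}{4} - \frac{30}{\sqrt{N}} + \frac{100}{N}.$$

For N = 14400 this reproduces $28900/14400 \approx 2$. As N grows, the ratio tends to 9/4 rather than decaying as 1/N, which is why a larger rectangular interleaver does not lower the asymptote.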

3 The Distance Spectrum of Turbo Codes 

In the previous section, it was shown that the “error floor” observed in the performance of 
Turbo codes is due to their relatively low free distance. It is now shown that the outstand- 
ing performance of Turbo codes at low SNR’s is a manifestation of the sparse distance 
spectrum that results when a pseudorandom interleaver is used in a parallel concate- 
nation scheme. To illustrate this the distance spectrum of an “average” (37,21,65536) 
Turbo code is found and its relationship to the performance of the code is discussed. 
The distance spectrum of the “average” Turbo code is then compared to the distance 
spectrum of the (2, 1, 14) code. An “average” Turbo code is one whose properties have 
been averaged over all possible pseudorandom interleavers [8]. By doing this, the analysis 
of Turbo codes and the algorithm for finding the distance spectrum are simplified. 

Using the algorithm described in [29], an “average” (37,21,65536) Turbo code was 
found to have the following distance spectrum 

     d    $N_d$    $W_d$

     6       3        6
     8      22       54
    10      41      157
    12     150      323

where $N_d$ is the total number of codewords of weight d and $W_d = N_d \tilde{w}_d$ is the total 
information weight of all codewords of weight d. The distance spectrum information 
for a particular distance d is referred to as a spectral line. This data can be used in 
conjunction with the bound of equation (2) to estimate the performance of the code. In 
addition, by plotting each term of equation (2) the contribution of each spectral line to 
the overall performance of the code can be estimated. 

The performance of a (37,21,65536) Turbo code is shown in Figure 8 along with 
curves showing the contribution of each spectral line for an “average” Turbo code with 
the same interleaver length. This clearly shows that the contribution to the code’s BER 
by the higher distance spectral lines is less than the contribution of the free distance 
term for $E_b/N_0$’s greater than 0.5 dB. Thus, the free distance asymptote dominates the 
performance of the code not only for moderate and high $E_b/N_0$’s, but also for low $E_b/N_0$’s. 
We characterize distance spectra for which this is true as sparse or spectrally thin. 

3.1 Comparison to the (2,1,14) Code 

The ramifications of a sparse distance spectrum are made evident by examining the 
distance spectrum of convolutional codes. The (2,1,14) convolutional code introduced 
in section 2, has the following distance spectrum 
as reported in [46]. When comparing the distance spectrum of a convolutional code and 
a Turbo code, it is important to remember that for a convolutional code iV rf ~ N x 
for the low weight codewords. Figure 9 shows the estimated performance of this code 
using the bound of equation (3) and the contribution of each spectral line. 

In this case, the contribution of the higher distance spectral lines to the overall BER 
is greater than the contribution of the free distance term for $E_b/N_0$’s less than 2.7 dB, 
which corresponds to BER’s of less than $10^{-6}$! The large SNR required for the free 
distance asymptote to dominate the performance of the (2, 1, 14) code is due to the rapid 
increase in the path multiplicity for increasing d. We characterize distance spectra for 
which this is true as spectrally dense. The dense distance spectrum of convolutional 
codes also accounts for the discrepancy between the real coding gain at a particular SNR 
and the asymptotic coding gain calculated using just the free distance [45]. 

Thus, it can be concluded that the outstanding performance of Turbo codes at low 
signal-to-noise ratios is a result of the dominance of the free distance asymptote, which 
in turn is a consequence of the sparse distance spectrum of Turbo codes, as opposed 
to spectrally dense convolutional codes. Finally, the sparse distance spectrum of Turbo 
codes is due to the structure of the codewords in a parallel concatenation and the use of 
pseudorandom interleaving. 


4 Spectral Thinning 

In this section, the observations made concerning the distance spectrum and spectral 
thinning of Turbo codes are formalized from the point of view of random interleaving. 
Random interleaving was introduced in [6]-[10] to develop transfer function bounds on the 
average performance of Turbo codes and to explore issues of code design. Here, random 
interleaving is used to explore the effect of the interleaver on the distance spectrum of 
the code. In order to simplify the notation and discussion, only nonpunctured Turbo 
codes are considered explicitly. The extension to punctured codes is straightforward and 
may be found in [29]. 

The fundamental idea of random interleaving is to consider the performance of a Turbo 
code averaged over all possible pseudorandom interleavers of a given length. For a given 
N, there are $N!$ possible pseudorandom interleavers and, assuming a uniform distribution, 
each occurs with probability $1/N!$. Let a particular interleaver map an information sequence 
x of weight w to an information sequence $x'$, also of weight w. Then there are a total 
of $w!\,(N - w)!$ interleavers in the ensemble of $N!$ interleavers that perform this same 
mapping. Thus, the probability that such a mapping occurs and, hence, that the codeword 
resulting from the input sequences x and $x'$ occurs is 

$$\frac{w!\,(N - w)!}{N!} = \frac{1}{\binom{N}{w}}.$$

Following [8], define the input redundancy weight enumerating function (IRWEF) of 
a systematic code as 

$$A(W, Z) = \sum_{w} \sum_{z} A_{w,z} W^{w} Z^{z}, \qquad (6)$$

where $A_{w,z}$ is the number of codewords of weight d = w + z generated by input sequences 
of weight w and parity sequences of weight z. The goal is now to develop a relationship 
between the codewords in the constituent encoders and $A_{w,z}$ for the Turbo code and to 
see how that relationship changes with the size of the interleaver. 

Recall from section 2 that a Turbo codeword is essentially the combination of a 
codeword from the first constituent encoder plus a codeword from the second constituent 
encoder. A codeword of weight $d_1 = w + z_1$ from the first constituent encoder caused 
by an information sequence x of weight w is composed of $n_1$ error events of total length 
$l_1$. To avoid difficulties in counting codewords when the second constituent encoder is 
left unterminated, we consider different orderings of the same error events as distinct 
cases. The ordered set of $n_1$ error events in the first encoder is denoted by $S_1$. The 
information sequence x that results in the set $S_1$ is mapped by a particular interleaver 
to the information sequence $x'$, also of weight w, which is then encoded by the second 
constituent encoder. This results in a codeword of weight $d_2 = w + z_2$, with the ordered 
set $S_2$ consisting of $n_2$ error events of total length $l_2$. 

For example, Figure 3 depicts a codeword of weight $d_1 = 4 + 3 = 7$ in the first 
constituent encoder caused by an information sequence of weight w = 4 and composed of 
$n_1 = 2$ error events of total length $l_1 = 5 + 3 = 8$. For the interleaver described in section 
2, x is mapped to an $x'$ that results in a codeword of weight $d_2 = 4 + 3 = 7$ in the second 
constituent encoder consisting of $n_2 = 2$ error events of total length $l_2 = 3 + 3 = 6$. Thus, 
this $S_1$ and $S_2$ result in a codeword of weight 10 and a single contribution to $A_{4,6}$ of the 
Turbo code for this particular interleaver. Averaged over the ensemble of interleavers of 
length N, a set $S_1$ with information weight w and parity weight $z_1$ and a set $S_2$ with 
information weight w and parity weight $z_2$ will contribute a fraction 

$$\frac{1}{\binom{N}{w}} \qquad (7)$$

to the enumerating function coefficients $A_{w,z_1+z_2}$ of an “average” Turbo code. 
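
The fraction in (7) can be checked by brute force for a very small interleaver. The sketch below enumerates all N! permutations and counts those that carry a fixed set of w positions onto another fixed set of w positions. The function name and the convention that `perm[i]` is the output position of input bit `i` are assumptions made for illustration.

```python
import itertools
import math
from fractions import Fraction


def mapping_fraction(n, source_positions, target_positions):
    """Fraction of the n! interleavers that map the ones at source_positions
    onto exactly the set target_positions (exhaustive count, small n only)."""
    src = set(source_positions)
    tgt = set(target_positions)
    hits = 0
    total = 0
    for perm in itertools.permutations(range(n)):
        total += 1
        # perm[i] is taken to be the output position of input bit i.
        if {perm[i] for i in src} == tgt:
            hits += 1
    return Fraction(hits, total)


n, w = 6, 2
fraction = mapping_fraction(n, source_positions=(0, 3), target_positions=(1, 4))
print(fraction, Fraction(1, math.comb(n, w)))
```

For N = 6 and w = 2, exactly 2! x 4! = 48 of the 720 permutations perform the mapping, which is 1/15, the reciprocal of the binomial coefficient.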

Because the sequence of zeroes connecting any two distinct error events has no effect 
on the weight of the information sequence or the parity sequence, there are 

$$\binom{N - l_1 + n_1}{n_1} \qquad (8)$$

ways that the ordered set, $S_1$, of $n_1$ error events can be arranged such that their 
contribution to $A_{w,z_1+z_2}$ is not changed. This is simply the number of ways in which $n_1$ distinct 
error events can be arranged in a sequence of length N while maintaining the order in 
which they appear. Similarly, if the codeword in the second constituent encoder ends in 
the all zero state, then the ordered set $S_2$ will make 

$$\binom{N - l_2 + n_2}{n_2} \qquad (9)$$

contributions to $A_{w,z_1+z_2}$. However, because the second encoder is not guaranteed to 
return to the all zero state, it is possible that the last of the $n_2$ error events is not 
actually an error event, but instead ends in a nonzero state. In this case, the last error 
event cannot be moved and the set $S_2$ makes 

$$\binom{N - l_2 + (n_2 - 1)}{n_2 - 1} \qquad (10)$$

contributions to $A_{w,z_1+z_2}$. 
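
The counting argument behind (8) through (10) can be verified by exhaustive placement for short sequences. The sketch below is a hedged illustration: it assumes that error events may be placed back to back and that, in the unterminated case, the last event is pinned to the end of the block. The function names are hypothetical.

```python
import math


def count_arrangements(n_total, lengths):
    """Count placements of ordered, non-overlapping events with the given
    lengths in a sequence of length n_total (events may touch)."""
    def place(index, earliest):
        if index == len(lengths):
            return 1
        remaining = sum(lengths[index:])
        count = 0
        for start in range(earliest, n_total - remaining + 1):
            count += place(index + 1, start + lengths[index])
        return count

    return place(0, 0)


def count_with_pinned_tail(n_total, lengths):
    """Same count when the last event is fixed at the end of the block."""
    return count_arrangements(n_total - lengths[-1], lengths[:-1])


n_total, lengths = 16, (5, 3)  # event lengths 5 and 3, total length 8
n_events, l_total = len(lengths), sum(lengths)
print(count_arrangements(n_total, lengths),
      math.comb(n_total - l_total + n_events, n_events))
print(count_with_pinned_tail(n_total, lengths),
      math.comb(n_total - l_total + n_events - 1, n_events - 1))
```

With N = 16 and events of length 5 and 3, the exhaustive count gives 45 free arrangements and 9 arrangements with the tail pinned, matching the two binomial coefficients.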

The contribution to the distance spectrum of a Turbo code due to any pair of ordered 
sets $S_1$ and $S_2$, averaged over the ensemble of pseudorandom interleavers, can now be 
computed using equations (7), (8), (9), and (10). If the codeword in the second constituent 
encoder happens to end in the all zero state, then the contribution of some $S_1$ 
and $S_2$ to $A_{w,z_1+z_2}$ is given by 

$$\frac{\binom{N - l_1 + n_1}{n_1} \binom{N - l_2 + n_2}{n_2}}{\binom{N}{w}}. \qquad (11)$$

If the codeword in the second constituent encoder does not end in the all zero state, then 
the contribution to $A_{w,z_1+z_2}$ is given by 

$$\frac{\binom{N - l_1 + n_1}{n_1} \binom{N - l_2 + (n_2 - 1)}{n_2 - 1}}{\binom{N}{w}}. \qquad (12)$$

Equations (11) and (12) can now be used to explore the effect of changing the interleaver 
size on the distance spectrum of an “average” Turbo code. 

Since we are primarily concerned with low weight codewords in the distance spectrum, 
we assume that $N \gg n_1, n_2, l_1$, and $l_2$. If this is not true, then $S_1$ and $S_2$ either contain 
a very large number of short error events or a few very long error events. In both cases, 
it is very unlikely that the result is a codeword of low weight. With this assumption, 
equation (11) can be approximated by 

$$\frac{w!}{n_1!\, n_2!}\, N^{n_1 + n_2 - w}, \qquad (13)$$

where, without loss of generality, it is assumed that $n_1 \ge n_2$. Since each error event is 
caused by an information sequence of weight at least two, $w \ge 2n_1$. The behavior of 
equation (13) for increasing N can be broken down into three cases: 


1. $n_1 > n_2$ 

The exponent of N is strictly negative and the contribution to $A_{w,z_1+z_2}$ decreases 
as N increases. 

2. $n_1 = n_2$ and $w > 2n_1$ 

The exponent of N is strictly negative and the contribution to $A_{w,z_1+z_2}$ decreases 
as N increases. 

3. $n_1 = n_2$ and $w = 2n_1$ 

The exponent of N is exactly zero and the contribution to $A_{w,z_1+z_2}$ converges to a 
finite value as N increases. 
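
The three cases can be illustrated numerically by comparing the exact ratio of binomial coefficients with its large-N approximation $\frac{w!}{n_1!\,n_2!} N^{n_1+n_2-w}$. This is a minimal sketch; the error-event lengths passed in below are hypothetical values chosen only to exercise each case, and the function names are not from the source.

```python
import math


def exact_contribution(n, w, n1, l1, n2, l2):
    """C(n - l1 + n1, n1) * C(n - l2 + n2, n2) / C(n, w)."""
    return math.comb(n - l1 + n1, n1) * math.comb(n - l2 + n2, n2) / math.comb(n, w)


def approx_contribution(n, w, n1, n2):
    """Large-n approximation: w! / (n1! n2!) * n ** (n1 + n2 - w)."""
    scale = math.factorial(w) / (math.factorial(n1) * math.factorial(n2))
    return scale * float(n) ** (n1 + n2 - w)


# (label, w, n1, l1, n2, l2); the lengths l1 and l2 are hypothetical.
cases = [
    ("n1 > n2", 4, 2, 10, 1, 5),
    ("n1 = n2, w > 2 n1", 3, 1, 6, 1, 6),
    ("n1 = n2, w = 2 n1", 4, 2, 10, 2, 10),
]
for label, w, n1, l1, n2, l2 in cases:
    for n in (1024, 4096, 16384, 65536):
        print(label, n,
              exact_contribution(n, w, n1, l1, n2, l2),
              approx_contribution(n, w, n1, n2))
```

In the first two cases the printed values shrink roughly as 1/N, while in the third they settle near 4!/(2! 2!) = 6 regardless of N.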


For $N \gg n_1, n_2, l_1$, and $l_2$, equation (12) can be approximated by 

$$\frac{w!}{n_1!\,(n_2 - 1)!}\, N^{n_1 + n_2 - w - 1}, \qquad (14)$$

where $w \ge 2n_1$. However, since the tail in the second encoder may be caused by an 
information sequence of weight 1, we also have $w \ge 2n_2 - 1$. The behavior of equation 
(14) for increasing N can be broken down into three cases: 

1. $n_1 > n_2$ 

Since $w \ge 2n_1$, the exponent of N is strictly negative and the contribution to 
$A_{w,z_1+z_2}$ decreases as N increases. 

2. $n_1 = n_2$ 

Again, since $w \ge 2n_1$, the exponent of N is strictly negative and the contribution 
to $A_{w,z_1+z_2}$ decreases as N increases. 

3. $n_1 < n_2$ 

Since $w \ge 2n_2 - 1$, the exponent of N is strictly negative and the contribution to 
$A_{w,z_1+z_2}$ decreases as N increases. 
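
The sign of the exponent in each case follows from the two lower bounds on w. Taking the exponent of N in (14) to be $e = n_1 + n_2 - w - 1$ (the value obtained by approximating each binomial coefficient $\binom{N - l + k}{k}$ in (12) by $N^k/k!$), the bounds give

$$n_1 \ge n_2,\ w \ge 2n_1: \qquad e \le n_1 + n_2 - 2n_1 - 1 = n_2 - n_1 - 1 \le -1,$$

$$n_1 < n_2,\ w \ge 2n_2 - 1: \qquad e \le n_1 + n_2 - (2n_2 - 1) - 1 = n_1 - n_2 \le -1.$$

In every case the exponent is at most -1, so an unterminated second encoder never yields a contribution that survives as N grows.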

The following lemma can now be stated. A similar result was proven in [10]. 

Lemma 1 Given a Turbo code based on two systematic feedback encoders and a pseudorandom 
interleaver of length N in which the first encoder is assumed to be forced back 
to the all zero state, the contribution of two ordered sets of error events $S_1$ and $S_2$ with 
the same information weight to the distance spectrum of the Turbo code, averaged over 
all pseudorandom interleavers of length N, converges to a nonzero constant as $N \to \infty$ 
if and only if 


1. $S_2$ leaves the second encoder in the all zero state. 

2. $S_1$ and $S_2$ contain the same number of error events. 

3. Each error event in $S_1$ and $S_2$ is caused by a weight two information sequence. 


In all other cases, the contribution goes to zero as $N \to \infty$. 

For each term $A_{w,z}$ in the input redundancy weight enumerating function of a Turbo 
code there is a finite number of pairs of sets $S_1$ and $S_2$ that contribute to it. If either set 
contains a long error event, then it is possible that that pair will be excluded for small 
interleavers. As the interleaver size increases, eventually all pairs of sets will be allowed 
and any further increase in N will not result in additional pairs of sets contributing to 
$A_{w,z}$. Also, as $N \to \infty$, $A_{w,z}$ will be determined by pairs of $S_1$ and $S_2$ that satisfy the 
three conditions of Lemma 1 and thus each $A_{w,z}$ will converge to a finite value. Since 
each spectral line is a finite sum of $A_{w,z}$ terms, each spectral line converges to a finite 
value as the interleaver size increases. 

The convergence of each spectral line to a finite value as the size of the interleaver 
increases results in spectral thinning. That is, for small interleavers there may be pairs 
of sets $S_1$ and $S_2$ that do not satisfy the conditions of Lemma 1, but which contribute to 
the multiplicity of a low weight spectral line. As the size of the interleaver increases, the 
number of these aberrational sets decreases until the spectral line reaches its final value as 
determined by Lemma 1. This process of spectral thinning is represented graphically in 
Figure 10 which depicts the thinning of low weight codewords as the size of the interleaver 
increases for hypothetical distance spectra. It is this thinning of the distance spectrum 
that enables the free distance asymptote of a Turbo code to dominate the performance 
for low SNR and thus to achieve near capacity performance. 

4.1 Primitive Polynomials and Free Distance 

We now consider the ramifications of Lemma 1 with respect to the free distance codewords 
of a Turbo code. That is, what does Lemma 1 imply about the information sequences 
that generate the free distance codewords in an “average” Turbo code? 

For an “average” Turbo code, Lemma 1 states that as the size of the interleaver 
increases each spectral line is the result of contributions only from pairs of ordered sets 
of error events in which each error event is caused by a weight two information sequence. 
It is reasonable to expect that the free distance spectral line will be among the first 
spectral lines to converge to its final value. Thus, for reasonably large interleavers the 
free distance will be determined by the sets $S_1$ and $S_2$ satisfying the conditions of Lemma 
1. Let $s_1$ and $s_2$ be any pair of error events caused by a weight two information sequence 
that results in a minimum weight parity sequence in the first and second constituent 
encoders, respectively. Note that there may be more than one such pair of minimum 
weight error events for the constituent encoders. 

A free distance codeword in an “average” Turbo code must be the result of sets $S_1$ 
and $S_2$ that consist of only those minimum weight error events $s_1$ and $s_2$, respectively. 
Furthermore, since each additional error event in either $S_1$ or $S_2$ adds weight to the 
codeword, $S_1$ and $S_2$ must each contain only one minimum weight error event. (If each 
error event does not add weight, then a weight two information sequence exists that 
generates a zero weight parity sequence and the free distance of the code would be 2.) 
Thus, the free distance codewords of an “average” Turbo code are caused by weight two 
information sequences, provided the interleaver is large enough. 

Therefore we have the following Lemma. 

Lemma 2 For an “average” Turbo code, as the size of the interleaver N approaches $\infty$: 

1. The free distance codewords are caused by information sequences of weight 2. 

2. The free distance of an “average” Turbo code is maximized by choosing constituent 
encoders that have the largest output weight for weight two information sequences. 

Based on this lemma, we now present an intuitive argument for why choosing the feedback 
polynomial $h_0(D)$ in a $(2, 1, \nu)$ systematic feedback encoder to be a primitive polynomial 
maximizes the output weight for weight two information sequences. It follows that the 
free distance of an “average” Turbo code is maximized by using a primitive polynomial 
as the feedback polynomial in the constituent encoders. This result was independently 
derived in [10] using the transfer function of an “average” Turbo code. 

The generator matrix of a $(2, 1, \nu)$ systematic feedback encoder is given by 

$$G_{fb}(D) = \left[\, 1 \quad \frac{h_1(D)}{h_0(D)} \,\right],$$

where $h_1(D)$ and $h_0(D)$ are referred to as the feedforward and feedback polynomials, 
respectively, and $h_0(D)$ is of degree $\nu$. Since only information sequences of weight 2 are 
being considered, the systematic output contributes weight 2 to the overall codeword 
weight for all the encoders being considered. Therefore, only the weight contributed by 
the parity sequence, that is 

$$y(D) = x(D)\, \frac{h_1(D)}{h_0(D)},$$

needs to be maximized. Furthermore, since we are concerned only with the choice of 
$h_0(D)$, $h_1(D)$ is assumed to be a polynomial such that $h_0(D)$ and $h_1(D)$ are relatively 
prime. (There is empirical evidence that the choice of both polynomials can affect the 
performance of the code [29], [31], but we will not address that issue in this paper.) 

Let $1 + D^K$, for some finite K, be the shortest input sequence of weight 2 that generates 
a finite length codeword. The resultant parity sequence is 

$$y(D) = (1 + D^K)\, \frac{h_1(D)}{h_0(D)} = \frac{h_1(D)}{h_0(D)} + D^K \frac{h_1(D)}{h_0(D)}.$$

Since $y(D)$ is of finite length, $h_1(D)/h_0(D)$ must be periodic with period K. Increasing 
the period K increases the length of the shortest weight 2 input sequence that generates 
a finite length codeword and therefore increases the length of that codeword. Intuitively, 
one would expect that increasing its length would result in the codeword gaining weight. 
That is, on average, half of the added bits would be ones. 

A strictly proper rational function of two polynomials, like $h_1(D)/h_0(D)$, is periodic 
with period $K \le 2^{\nu} - 1$. The period is maximized, that is, $K = 2^{\nu} - 1$, when $h_0(D)$ is a 
primitive polynomial. Since the free distance of an “average” Turbo code is determined by 
information sequences of weight 2, for sufficiently large interleavers the free distance will 
be maximized by maximizing K. Therefore, choosing $h_0(D)$ to be a primitive polynomial 
will result in a larger free distance for an “average” Turbo code. 
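
The period K can be computed directly as the smallest K for which $D^K \equiv 1$ modulo $h_0(D)$ over GF(2). The sketch below is illustrative: it assumes the octal labels are read as bit patterns with bit i holding the coefficient of $D^i$ (reversing the bit order gives the reciprocal polynomial, which has the same period), and the function name is hypothetical.

```python
def period_of_inverse(h0):
    """Smallest K >= 1 with D**K == 1 modulo h0(D) over GF(2).

    h0 is an integer whose bit i is the coefficient of D**i. It must have
    degree at least 1 and a nonzero constant term so that D is invertible.
    """
    degree = h0.bit_length() - 1
    if degree < 1 or h0 & 1 == 0:
        raise ValueError("h0 must have degree >= 1 and a nonzero constant term")
    remainder = 1  # represents D**0
    k = 0
    while True:
        remainder <<= 1             # multiply by D
        k += 1
        if (remainder >> degree) & 1:
            remainder ^= h0         # reduce modulo h0(D)
        if remainder == 1:
            return k


for label in (0o37, 0o23):
    print(oct(label), period_of_inverse(label))
```

Under this reading, the feedback polynomial 37 has period 5 and the primitive polynomial 23 has the maximum period $2^4 - 1 = 15$.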

To test this, we compare a (37, 21, 400) Turbo code which has K = 5 and free distance 
$d_{free} = 6$ to a (23, 35, 400) Turbo code. Both codes are punctured as in [1]. The feedback 
polynomial $h_0 = 23$ in the second Turbo code is a primitive polynomial of degree $\nu = 4$ 
and thus $1/h_0$ has a period of $K = 2^{\nu} - 1 = 15$. The free distance of this Turbo code 
was found to be $d_{free} = 10$ [13], [29]. Figure 11 shows simulation results for these two 
codes using the iterative decoding algorithm of [1] with 18 iterations. As expected, the 
second Turbo code performs better at moderate and high SNR’s because its free distance 
asymptote is steeper due to the increased free distance. 


5 Conclusions 

The excellent performance of Turbo codes may be explained in terms of the distance 
spectrum of the code. The ‘error floor’ observed in simulations of Turbo codes is a 
manifestation of the free distance asymptote. Since Turbo codes have relatively low free 
distances the free distance asymptote has a shallow slope, and thus the performance 
curves flatten out at moderate to high SNR’s. The ‘error floor’ may be lowered by increasing 
the size of the interleaver for a fixed free distance, that is, by reducing the 
effective multiplicity of the code. Alternatively, for fixed interleaver lengths, the performance 
may be improved for moderate and high SNR’s by increasing the free distance. 
Choosing primitive polynomials as the feedback polynomials in the constituent encoders 
usually results in an increased free distance. 

The exceptional performance of Turbo codes at low SNR’s is due to the sparse dis- 
tance spectrum and the resultant ability of the code to follow the free distance asymptote 
at moderate to low SNR’s. The use of systematic feedback encoders and pseudorandom 
interleavers results in spectral thinning, in which information sequences which gener- 
ate low weight parity sequences from the first constituent encoder are interleaved with 
high probability to information sequences that generate high weight parity sequences in 
the second constituent encoder. Spectral thinning is enhanced by increasing interleaver 
lengths. For very large interleavers, spectral thinning results in a sparse distance spec- 
trum in which the first several spectral lines are determined solely by input sequences 
of weight two. Thus, spectral thinning results in few low weight codewords and a large 
number of codewords of “average” weight. This is very similar to the type of distance 
spectrum achieved by “random-like” codes [4]. 

In a more philosophical light, Turbo codes remind us that information theoretical 
arguments imply that long block lengths, but not necessarily large free distances, are 
required to achieve capacity at moderate BER’s. Thus, like convolutional codes, Turbo 
codes are a class of codes that achieve long block lengths, but without the corresponding 
increased density of the distance spectrum common to convolutional codes, and for which 
a practical, albeit nontrivial, decoding algorithm exists. In addition, Turbo codes are 
time-varying due to the pseudorandom interleaver, and the time-varying structure is 
essential in achieving the distance spectrum that results in near capacity performance 
at moderate BER’s. This suggests that some effort should be made to find other classes 
of time-varying codes, and decoding algorithms, that have good distance spectra, rather 
than just large free distances. Finally, since, in fact, long block lengths are required to 
achieve near capacity performance at moderate BER’s, only modest coding gains will be 
achievable in systems that use relatively short block lengths. 

Figure 1: Simulation results for a (37,21,65536) Turbo code and a (2,1,14) MFD convolutional 
code. 

Figure 2: Block diagram of a Turbo encoder with two identical constituent encoders 
($h_0(D) = 1 + D^2$, $h_1(D) = D$) and without puncturing. 

Figure 3: Trellis diagrams for a codeword in the $(1 + D^2, D, 16)$ Turbo code. 

Figure 4: Simulation results for a (37,21,65536) Turbo code and a (2,1,14) MFD convolutional 
code and the free distance asymptotes. 

Figure 5: Simulation results for a (37,21,N) Turbo code with varying interleaver size N. 

Figure 6: Simulation results and the free distance asymptote for the (37,21,14400) Turbo 
code with a 120 x 120 rectangular interleaver. 

Figure 7: Portions of the interleaver and information sequences resulting in free distance 
codewords for a (37,21,14400) Turbo code with a 120 x 120 rectangular interleaver. 

Figure 8: Performance of the (37,21,65536) Turbo code decomposed by spectral line. 

Figure 9: Performance of the (2,1,14) MFD convolutional code decomposed by spectral 
line. 

Figure 10: Graphical representation of spectral thinning for increasing interleaver size. 

Figure 11: Simulation results for a (37,21,400) Turbo code and a (23,34,400) primitive 
Turbo code. 